Sparse and Truncated Suffix Trees on Variable-Length Codes

نویسندگان

  • Takashi Uemura
  • Hiroki Arimura
چکیده

The sparse suffix trees (SST), introduced by (Kärkkäinen and Ukkonen, COCOON 1996), is the suffix tree for a subset of all suffixes of an input text T of length n. In this paper, we study a special case that an input string is a sequence of codewords drawn from a regular prefix code ∆ ⊆ Σ recognized by a finite automaton, and index points locate on the code boundaries. In this case, we present an online algorithm that constructs the sparse suffix tree for an input string t on any variable-length regular prefix code, called the code suffix tree (CST), in O(n+m) time and O(k) additional space for a fixed base alphabet Σ, where m is the size of an automaton for ∆. Furthermore, we present a modified algorithm for k-truncated version of code suffix trees that runs in the same time and space complexities. Hence, these results generalize the previous results (Inenaga and Takeda, CPM 2006) for word suffix trees and (Na, Apostolico, Iliopoulos, and Park, Theor. Comp. Sci., 304, 2003) for truncated suffix trees on arbitrary variable-length regular prefix codes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Directly Addressable Variable-Length Codes

We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, such that it is possible to directly access the i-th codeword without need of any sampling method. The technique is practical and has many applications to the representation of ordered sets, sparse bitmaps, partial sums, and compressed data structures for suffix trees, arrays, and inverted indexes, to...

متن کامل

Improved Variable-to-Fixed Length Codes

Though many compression methods are based on the use of variable length codes, there has recently been a trend to search for alternatives in which the lengths of the codewords are more restricted, which can be useful for fast decoding and compressed searches. This paper explores the construction of variable-to-fixed length codes, which have been suggested long ago by Tunstall. Using a new heuri...

متن کامل

On improving Tunstall codes

Though many compression methods are based on the use of variable length codes, there has recently been a trend to search for alternatives in which the lengths of the codewords are more restricted, which can be useful for fast decoding and compressed searches. This paper explores the construction of variable-to-fixed length codes, which have been suggested long ago by Tunstall. Using new heurist...

متن کامل

Sparse compact directed acyclic word graphs

The suffix tree of string w represents all suffixes of w, and thus it supports full indexing of w for exact pattern matching. On the other hand, a sparse suffix tree of w represents only a subset of the suffixes of w, and therefore it supports sparse indexing of w. There has been a wide range of applications of sparse suffix trees, e.g., natural language processing and biological sequence analy...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011